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CN , Abstract 

^ ' We develop a Recursive £i -Regularized Least Squares (SPARLS) algorithm for the estimation 

', of a sparse tap-weight vector in the adaptive filtering setting. The SPARLS algorithm exploits noisy 

observations of the tap-weight vector output stream and produces its estimate using an Expectation- 
Maximization type algorithm. Simulation studies in the context of channel estimation, employing multi- 
Q ' path wireless channels, show that the SPARLS algorithm has significant improvement over the con- 

ventional widely-used Recursive Least Squares (RLS) algorithm, in terms of both mean squared error 
. (MSB) and computational complexity. 

cn 

O ■ I- Introduction 

Q ■ Adaptive filtering is an important part of statistical signal processing, which is highly appealing 

a^ : 

O ■ in estimation problems based on streaming data in environments with unknown statistics [8]. 



X 



In particular, it is widely used for echo cancellation in speech processing systems and for 
equalization or channel estimation in wireless systems. 

A wide range of signals of interest admit sparse representations. Furthermore various input 
output systems are described by sparse models. For example, the multi-path wireless channel 
has only a few significant components [2] . Other examples include echo components of sound in 
indoor environments and natural images. However, the conventional adaptive filtering algorithms, 
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such as Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms, which 
are widely used in practice, do not exploit the underlying sparseness in order to improve the 
estimation process. 

There has been a lot of focus on the estimation of sparse signals based on noisy observations 
among the researchers in the fields of signal processing and information theory (Please see 
[1], [3], [4], [5], [9], [11] and [13]). Although the above-mentioned works contain fundamental 
theoretical results, most of the proposed estimation algorithms are not tailored to time varying 
environments with real time requirements; they suffer from high complexity and are not appro- 
priate for implementation purposes. 

Recently, Bajwa et. al [2] used the Dantzig Selector (presented by Candes and Tao [4]) and 
Least Squares (LS) estimates for the problem of sparse channel sensing. Although the Dantzig 
Selector and the LS method produce sparse estimates with improved MSB, they do not exploit 
the sparsity of the underlying signal in order to reduce the computational complexity. Moreover, 
they are not appropriate for the setting of streaming data. 

In this paper, we introduce a Recursive £i -Regularized Least Squares (SR^.RLS) algorithm 
for adaptive filtering setup. The SR\RLS algorithm is based on an Expectation-Maximization 
(EM) type algorithm presented in [6] and produces successive improved estimates based on 
streaming data. Simulation studies show that the SR^RLS algorithm significantly outperforms 
the RLS algorithm both in terms of MSE and computational complexity for static and time- 
varying sparse signals. In particular, for estimating a time- varying Rayleigh fading wireless 
channel with 5 nonzero coefficients, the SR\RLS gains about 7dB over the RLS algorithm in 
MSE and has about 70% less computational complexity. 

The outline of the paper is as follows: We will introduce the notation in Section UIl The 
adaptive filtering setup is discussed in Section Hill We will explain the regularized cost function 
in Section |IVl An efficient algorithm to optimize the regularized cost function, namely Low- 
Complexity Expectation Maximization (LCEM) is introduced in Section |Vl We will formally 
define the SR^RLS algorithm along with complexity analysis and related discussions in Section 



January 6, 2009 



DRAFT 



3 



rvTl Simulation studies are presented in Section IVIIl followed by conclusion in Section IVIIII 

II. Notation 

Let X be a vector in C^. We define the Cq quasi-norm of x as follows: 

||x||o = |{x.|x, ^0}| (1) 

A vector x G C^^ is called sparse, if ||x||o ^ M. Let A be a matrix in C^^*^ and J C 
{1, 2, ■ ■ ■ , M} be an index set. We denote the sub-matrix of A with columns corresponding to 
the index set J7 by Aj. Similarly, we denote the sub-vector of x G C^^ corresponding to the 
index set J hy Xj. We denote the conjugate transpose of A G C^^^^ and x G C^^ by A* and 
X*, respectively. We also define the element-wise magnitude and signum operators as follows: 

|x| := Ixal, ■ ■ ■ , \xm\T (2) 

and 

sgn(x) := [sgn(xi), sgn(x2), ■ ■ ■ , sgn(xM)]'^ (3) 

for X G M^^, where 

f 1 a;^ > 

sgn(a;i) := < (4) 
1-1 Xi < 

For X, y G C*^, we define the element-wise multiplication as follows: 

X ■ y := [xiyi, X2?/2, ■ ■ ■ , xmVmY (5) 

For any x G R^^, we define 

x+ := [(xi)+, (X2)+, ■ ■ ■ , {xm)+Y (6) 

where 

{Xi)+ := max(xi,0) (7) 
Finally, we define the all-one vector in as 

1:=[1,1,---,1]^. (8) 
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x{i) x{i-l) x{i-2) x{i-M+2) x{i-M+l) 




d{i) 

Fig. 1. Adaptive filtering model 

III. Adaptive Filtering Setup 

A. Canonical Adaptive Filtering Setup 

Consider the conventional adaptive filtering setup, consisting of a transversal filter followed 
by an adaptation block (Fig. 1). The tap-input vector at time i is defined by 

x(i) := [x(i),x(i -I),-- - ,x(i-M + l)f (9) 

where x{k) is the input at time k, k = 1, - ■ ■ ,n. The tap-weight vector at time n is defined by 

w(n) := [wQ{n),Wi{n), ■■■ , WM-i{n)f (10) 

Note that the tap-weight vector is assumed to be constant during the observation time 1 < i < n. 
The output of the filter at time i is defined by 

— w*(n)x(i) (11) 

The tap-weight vector w(n) is updated by the adaptation block in order to optimize a certain cost 
function. Let d{i) be the desired output of the filter at time i. We can define the instantaneous 
error of the filter by 

e(i) := d(i) - y{i) = d{i) - w*(n)x(i) (12) 
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The operation of the adaptation block at time n can therefore be stated as the following opti- 
mization problem: 

mill /(e(l),e(2),--- ,e(n)) (13) 

w(n) 

where / > is a certain cost function. The adaptation block exploits e(i), i = 1,2, ■ ■ ■ ,n in 
order to adjust yv{n). Note that there is no constraint on the desired output d{i) and / so far. 
With appropriate choices for d(i) and /, one can recast problems such as channel estimation, 
echo cancellation and equalization in the canonical form given by Eq. (fT3l) . 

In particular, if d{i) is generated by an unknown tap-weight w(n), i.e., d{i) = w*{n)x(i), with 
an appropriate choice of /, one can possibly obtain a good approximation to w{n) by solving 
the optimization problem given in (fT3l) . This is, in general, an estimation problem and is the 
topic of interest in this papeiQ. 

B. Examples of Conventional Cost Functions 

There are various choices for / which give rise to certain approximations of the unknown 
vector w{n). For example, one can choose / to be 

fLMs{e{l),e{2),--- ,e{n)) := \e{n)\^ (14) 

which is the squared error at time n. Solving the optimization problem given in Eq. (fT3l) for 
fiMS will result in the well-known Least Mean Squares (LMS) algorithm [8]. Note that /lms 
is memoryless, i.e., it only depends on e(n). Another choice for / can be 

n 

/M(e(l),e(2),---,e(n)) :=Y,f3{t,n)\e{t)\' (15) 

i=l 

where P{i,n), i = 1,2,--- ,n are non-negative constants, introducing memory to the cost 
function. Two useful weighting sequences are the sliding window, where P{i,n) is zero prior to 
a positive integer / < n, and the exponentially decaying sequence 

(3{i,n) = X""-' (16) 

'Our discussion will focus on single channel complex valued signals. The extension to the multi-variable case presents no 
difficulties. 
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for z = 1, 2, 



, n and A a non-negative constant. Using the latter weighting sequence, the cost 



function takes the form 



fnLs{eil),e{2),... ,e{n)) :=^A'^-|e( 



(17) 



i=l 



( 



The parameter A is commonly referred to as forgetting factor. The solution to the optimization 
problem in Eq. (fT3l) with Jrls gives rise to the well-known Recursive Least Squares (RLS) 
algorithm. It is well known [8] that RLS enjoys faster convergence than LMS (where A = 0). 
The cost function f^LS given in (flTI) corresponds to a least squares identification problem. Let 




d(n):= K(l),rf*(2),- 

and X(n) be an n x M matrix whose ith row is x*(i), i.e., 

( xH\\ ■■■ \ 



Dfn) := 



v 





: 

... ly 



(18) 



(19) 



X(n) := 

x\n~\) x*(n-2) ... x*(n-M) 

\ x*(n) x*(n-l) ... x*(n-M + l) / 
The cost function can be written in the following form: 

/^z.5(e(l),e(2),... ,e(n)) = \\i^l\n)^{n) -\i^l\n)-^{n)^{n)\\ 



(20) 



(21) 



where Xy^l'^in) is a diagonal matrix with entries D^l'^i^ 



n] 



Diiin). Thus, the solution to the 



optimization problem with the cost function fjus can be expressed in terms of the following 
normal equations [8]: 

$(n)w(n) = z{n) (22) 



where 



and 



#(n) := X*(n)D(n)X(n) = ^ A"-*x(i)x* 



1=1 



z(n. 



:= X*{n)D{n)d{n) = ^ A"''x(z)ci* 



(23) 



(24) 



i=l 
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IV. Regularized Cost Function 

A. Noisy Observations 

The canonical form of the problem typically assumes that the input-output sequences are 
generated by a time varying system with parameters represented by w{n). In most applications 
however, stochastic uncertainties are also present. Thus a more pragmatic data generation process 
is described by the noisy model 

d{i) = w*{n)x{{} + r]{i) (25) 

where r]{i) is the observation noise. Note that w(n) reflects the true parameters which vary 
with time in a piecewise constant manner. The noise will be assumed to be i.i.d. Gaussian, i.e., 
r](i) ~ A/'(0, 0"^). The estimator has only access to the streaming data x{i) and d{i). 

B. Estimation of Sparse Vectors 

A wide range of interesting estimation problems deal with the estimation of sparse vectors. 
Many signals of interest can naturally be modeled as sparse. For example, the wireless channel 
usually has a few significant multi-path components. One needs to estimate such signals for 
various purposes. Suppose that ||w(r7,)||o = L <^ M. A sparse approximation to w{n) can be 
obtained by solving the following optimization problem: 

min||w(n)||o s.t. /(e(l), e(2), ■ ■ ■ , e(n)) < e (26) 

w(n) 

where e is a positive constant controlling the cost error in (fT3l) . The above optimization problem 
is computationally intractable. A considerable amount of recent research in statistical signal 
processing is focused on efficient estimation methods for estimating an unknown sparse vector 
based on noiseless/noisy observations (Please see [3], [4], [5], [7] and [9]). In particular, convex 
relaxation techniques provide a viable alternative, whereby the Cq quasi-norm in (|26l) is replaced 
by the convex £i norm so that (|26l) becomes 

min||wH||i s.t. /(e(l),e(2),--- ,eH) < e (27) 

w(n.) 

A convex problem results when / is convex, as in the RLS case. The Lagrangian formulation 
shows that if / = fuLS, the optimum solution can be equivalently derived from the following 
optimization problem 

min IAtIIdi/^^^NjI^^) _ 1)1/2. xx(n)w(n)||J + 7||w ^28) 
w(n) [2a'''' ""^ J 
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7 represents a trade off between estimation error and sparsity of the parameter coefficients. 
Sufficient as well as necessary conditions for the existence and uniqueness of a global minimizer 
are derived in [11]. These conditions require that the input signal must be properly chosen so that 
the matrix T>^^'^{n)X{n) is sufficiently incoherent. Suitable probing signals for exact recovery in 
a multi-path environment are analyzed in [2]. 

V. Low-Complexity Expectation Maximization Algorithm 



The convex program in Eq. (1281) can be solved with the conventional convex programming 
methods. Here, we adopt an efficient solution presented by Nowak [6] in the context of Wavelet- 
based image restoration, which we will modify to an online and adaptive setting. Consider the 
noisy observation model: 

d{n) = X{n)w{n) + r){n). (29) 

where T]{n) ~ A/'(0, a^I), with the following cost function 

^\\D'/\n)din) - Di/2(n)x(n)wH + 7||wH ||i (30) 

= ^ {d{n) - X(n)w(r2) j *D(n) {d{n) - X(n)w(n) j + 7|| w(n) || 1 

If we consider the alternative observation model: 

d(n) = X(n)w(n) + i{n). (31) 

with ^(n) ~ A/'(0, cr^D~^(n)), the convex program in Eq. (|28l) can be identified as the following 
Maximum Likelihood (ML) problem: 

max < logp(d(n)|w(r2)) — 7||w(n)||i [ (32) 

where p(d(n)|w(?7,)) := A/'(X(n)w(n), o'^D^^(ra)). This ML problem is in general hard to solve. 
The clever idea of [6] is to decompose the noise vector ^{n) in order to divide the optimization 
problem into a denoising and a filtering problem. We adopt the same method with appropriate 
modifications for the cost function given in Eq. (|32|) . Consider the following decomposition for 

iin): 

i{n) = aX{n)iM^iikr^) (33) 
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Fig. 2. Soft thresholding function 



where ~ Ar(0, 1) and ~ Ar(0, (T2D-^(n) - a^X{n)X* (n)) . We need to choose 

< (T^/si, where si is the largest eigenvalue of X(n)X*(n), in order for ^2("^) to have a 
positive semi-definite covariance matrix. We can therefore rewrite the model in Eq. (|3TI) as 

\(n) = w(n) + 1 (n) 
d(?2) = X(n)v(?2) + 

The Expectation Maximization (EM) algorithm can be used to solve the ML problem of (|32l) . 
with the help of the following ML problem 

max < logp(d(ra), v(r;,)|w(r;,)) — 7||w(r;,)||i L (35) 

w(n) I J 

which is easier to solve. The ith iteration of the EM algorithm is as follows: 

rW(n) = (I - ^X*{n)D{n)X{n))\v^^\n) + ^X* {n)D{n)d{n) 



w(^+i)(n) = sgn (^rW(r2)) ■ (|rW(r2)| - ^aH^ 



(36) 



The function sgn(x)(|x| — 7tt^)^ is denoted by soft thresholding function and is plotted in Fig. 
2. 

It is known that the EM algorithm given by Eq. (|36l) converges [12]. Note that the soft 
thresholding function tends to decrease the support of the estimate w(n), since it will shrink the 
support to those elements whose absolute value is greater than 70;^. We can use this observation 
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to express the double iteration given in Eq. (l36l) in a low complexity fashion. Let X*^^^ be the 
support of r^^\n) at the £th iteration. Let 

jW:={z:rf)(n)<-7a2}cjW ' 

„2 



a 



B(n) := I - — X*(n)D(n)X(n), 



(37) 



(38) 
(39) 



and 



um 



X*('n)D('r2)d('n) 



Note that the second iteration in Eq. (|36l) can be written as 



for i = 1,2, 



^(f+l)/ N 



M. We then have 



W/ \ 2 ■ ^ T-W 



which allows us to express the EM iteration as follows: 



(40) 



(41) 



B(n)w('+i)(n) = ^^,,M)[^%{n) - ^aH^,,,) ^-^^^..^^^^^ (42) 



s(^+i)(n) = B^to (n) (s^,) (n) + u^w(n) - 7a2l^(f)) + B^^ (n) (s4) (n) + u^w(n) + 7021^^^^) 



.(£+1) _ 



= s(^+i)H+u(n) 
J|^-^^ = {z:rf+')(n) >7a2} 
x(_^+i) = {z:rf+^)(n)<-7a2} 



(43) 



This new set of iteration has a lower computational complexity, since it restricts the matrix 
multiplications to the instantaneous support of the estimate v^^^{n\ which is expected to be 
close to the support of w(n) [11]. We denote the iterations given in Eq. (|43] ) by Low-Complexity 
Expectation Maximization (LCEM) algorithm. 
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VI. The SPARLS Algorithm 

A. SPARLS 

Upon the arrival of the nth input, x{ri), the LCEM algorithm computes the estimate w(n) 
given B(n), u(n) and s^^\n). The LCEM algorithm is summarized in Algorithm 1. Note that 
the input argument k denotes the number of iterations. 



Algorithm 1 LCEM (B(n), u(n), s^, A;) 



Inputs: B{n), u(n), 8*^°^ k. 
Outputs: w(n). 
1: r(o) =s(°)+u(n). 



2: i: 
3: t 



(0) 



(0) 



{i : rf (n) > 0}. 
{t : rf (n) < 0}. 
4: for £ = 0, 1, ■ ■ ■ , A; - 1 do 

6: r(^+i)(n) =s(^+i)(n)+u(n). 



7: 



r(^+l) 
'+ 

r{t+l) 



= {i:rf+'^(n) >7q;2}. 



= {i:rf+')(n) <-7q;2}. 



9: end for 



10: for i = 1,2, • • • ,M do 
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Wj(n) 
12: end for 



r(fc) 



r; ' — 70;^ i E X\ 

(fc) , , T-(fc) 







Upon the arrival of the nth input, B(n) and u(n) can be obtained via the following rank-one 
update rules: 

I Bin) = XB{n - 1) - ^x{n)r {n) ^^^^ 
1 u(n) = Au(n - 1) + ^d*{n)x{n) 

The SPARLS algorithm is formally defined in Algorithm 2. Without loss of generality, we can 
set the time index n = 1 such that x(l) 7^ 0, in order for the initialization to be well-defined. 
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Algorithm 2 SPARLS 

Inputs: B(l) = I - f^x(l)x*(l), u(l) = ^x(l)d*(l) and k. 

Output: w(n). 

1: for all Input x{n) do 

2: B(n) = AB(n - 1) - f^x(n)x*(n). 
3: u(n) = Au(n - 1) + ^d*{n)x{n). 
4: Run LCEM (B(n) , u(n) , B(n) w(n) , A;) . 
5: Update w(n). 

6: end for 



B. Complexity Analysis 

The LCEM algorithm requires M[\lf^\ + multiplications at the £th iteration. Thus, for 

a total of k iterations, the number of multiplications will be kMN, where 



For a sparse signal w(n), one expects to have N O(||w(n)||o) = 0{L). Therefore, the 
complexity of the LCEM algorithm is roughly of the order 0{kLM). Simulation results show 
that a single LCEM iteration {k — 1) is sufficient for the SPARLS algorithm to result in significant 
gains in terms of both MSE and computational complexity. Note that the Recursive Least Squares 
(RLS) algorithm requires C(M^) multiplications, which clearly has higher complexity compared 
to the SPARLS. 

C. Discussion of the SPARLS Algorithm 

The parameter a in the SPARLS algorithm must be chosen such that < cr^/si, where Si is 
the largest eigenvalue of X*(n)X(n). For large n, the eigenvalues of X*(n)X(n) will all tend to 

1, given x{i) ~ A/'(0, 1/M) for i = 1,2, ■ ■ ■ ,n. Therefore, by choosing a = a/2, the condition 
of < cr^/ Si is satisfied with overwhelming probability. 

The parameter 7 is an additional degree of freedom which controls the trade-off between 
sparseness of the output (computational complexity) and the MSE. For very small values of 7, 




k-l 



(45) 
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the SPARLS algorithm coincides with the RLS algorithm. For very large values of 7, the output 
will be the zero vector. Thus, there are intermediate values for 7 which result in low MSB and 
sparsity level which is desired. 

The parameter 7 can be fine-tuned according to the application we are interested in. For 
example, for estimating the wireless multi-path channel, 7 can be optimized with respect to the 

number of channel taps (sparsity), temporal statistics of the channel and noise level via exhaustive 
simulations or experiments. Note that 7 can be fine-tuned offline for a certain application. There 
are also some heuristic methods for choosing 7 which are discussed in [6]. 

VII. Simulation Studies 

We consider the estimation of a sparse multi-path wireless channel generated by the Jake's 
model [10]. In the Jake's model, each component of the tap-weight vector is a sample path of 
a Rayleigh random process with autocorrelation function given by 

R{n) = Jo(27rn/,r,) (46) 

where Jo(-) is the zeroth order Bessel function, fa is the Doppler frequency shift and Tg is 
the channel sampling interval. The dimensionless parameter f^Ts gives a measure of how fast 
each tap is changing over time. Note that the case fa — corresponds to a constant tap-weight 
vector. Thus, the Jake's model covers constant tap-weight vectors as well. For the purpose of 
simulations, Tg is normalized to 1. 

In order to compare the performance of the SPARLS and RLS algorithms, we first need to 
optimize the RLS algorithm for the given time-varying channel. By exhaustive simulations, the 
optimum forgetting factor. A, of the RLS algorithm can be obtained for various choices of SNR 
and fd. The optimal values of A for several choices of SNR and fa are summarized in Table 1. 

As for the SPARLS algorithm, we perform a partial optimization as follows: we use the 
values of Table 1 for A and optimize over 7 with exhaustive simulations. The optimal values 
of 7 are summarized in Table 2. Note that with such choices of parameters A and 7, we are 
comparing a near-optimal parametrization of SPARLS with the optimal parametrization of RLS. 
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TABLE I 

Optimal values of A, given cP' and /d, for the RLS algorithm. Each entry has an error of ±0.01. 








0.0001 


0.0005 


0.001 


0.005 


0.01 


0.0001 


0.98 


0.95 


0.95 


0.99 


0.99 


0.99 


0.0005 


0.99 


0.97 


0.98 


0.99 


0.99 


0.99 


0.001 


0.99 


0.97 


0.98 


0.99 


0.99 


0.99 


0.005 


0.99 


0.99 


0.99 


0.99 


0.99 


0.99 


0.01 


0.99 


0.99 


0.99 


0.99 


0.99 


0.99 


0.05 


0.99 


0.99 


0.99 


0.99 


0.99 


0.99 



TABLE n 

Optimal values of 7, given , A and /d, for the SPARES algorithm. Each entry has an error of ±5. 








0.0001 


0.0005 


0.001 


0.005 


0.01 


0.0001 


100 


100 


100 


100 


100 


100 


0.0005 


45 


40 


40 


60 


50 


50 


0.001 


30 


25 


30 


25 


25 


25 


0.005 


15 


15 


10 


10 


10 


10 


0.01 


10 


10 


5 


5 


5 


5 


0.05 


5 


5 


3 


2 


2 


2 



The perforaiance of the SPARLS can be further enhanced by simultaneous optimization over 
both A and 7. 

We compare the performance of the SPARLS and RLS with respect to two performance 
measures. The first measure is the MSB defined as 

E|l|w - wll^l 

MSB J[^^ (47) 

where the averaging is carried out by 50000 Monte Carlo samplings. The number of samples 

was chosen large enough to ensure that the uncertainty in the measurements is less than 1%. 

The second measure is the computational complexity ratio (CCR) which is defined by 

average number of multiplications for SPARLS 

LCR (4o} 

average number of multiplications for RLS 
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In all simulations the input data x{i) is i.i.d. and distributed according to A/'(0, 1/M). The SNR 
is also defined as E{||w||2}/cr^, where is the variance of the Gaussian zero-mean observation 
noise. The locations of the nonzero elements of the tap-weight vector are randomly chosen in 
the set {1, 2, ■ ■ ■ , M} and the SPARLS algorithm has no knowledge of these locations. Also, 
all the simulations are done with k = 1, i.e., a single LCEM iteration per new data. Finally, a 
choice of a = cr/2 has been used (Please see Section IVI-CI) . 

Figures 3 and 4 show the mean squared error and computational complexity ratio of the 
RLS and SPARLS algorithms for fa = 0, 0.0001, 0.0005, 0.001, 0.005 and 0.01, with L = 5 and 
M = 100, respectively. The SPARLS algorithm outperforms the RLS algorithm with about 7 
dB gain in the MSE performance. Moreover, the computational complexity of the SPARLS is 
about 70% less than that of RLS. 

VIIL Conclusion 

We have developed a Recursive £i -Regularized Least Squares (SPARLS) algorithm for the 
estimation of a sparse tap-weight vector in the adaptive filtering setting. The SPARLS algo- 
rithm estimates the tap-weight vector based on noisy observations of the output stream, using 
an Expectation-Maximization type algorithm. Simulation studies, in the context of multi-path 
wireless channel estimation, show that the SPARLS algorithm has significant improvement over 
the conventional widely-used Recursive Least Squares (RLS) algorithm, in terms of both mean 
squared error (MSE) and computational complexity. 
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